(54) A computer implemented method and system for information retrieval



(57) A computer-implemented method and system for retrieving information. A first file of information is
received which includes a first markup language to identify contents of the information. Responsive to
receiving the first file of information, the first file of information is parsed to generate a list of
profiles, and at least one corresponding topic for each of the list of profiles. A second file in a second
markup language is created containing the list of the profiles, and at least one corresponding third file is
created in a third markup language for the at least one corresponding topic for each of the list of profiles.
The second file contains anchors referencing each at least one corresponding third file, and first markup
instances in the first file of information are converted to second markup instances in either the second file
or the third file. The first file of information is parsed to determine the at least one article, if any, for
each at least one corresponding topic for each of the list of profiles, and a corresponding brief for the at
least one article. A fourth file and a fifth file are generated for the at least one article, if any, for
each at least one corresponding topic for each of the list of profiles. The fourth file includes a brief of
each of the at least one article in the first file of information and an anchor to the fifth file, the fifth
file including text for the at least one article, if any, for each at least one corresponding topic for each
of the list of profiles. In implemented embodiments, a sixth file can also be created which contains a
plurality of anchors referencing a plurality of the fifth files, wherein the anchors in the sixth file are
arranged by each profile and corresponding topic. Receiving the first file of information can include
receiving an electronic mail (e-mail) message. The processing of the first file, and the creation of the
additional files with the included anchors, allows hypertext viewing and traversal of the processed
information.

In another embodiment, a computer-implemented method and system for retrieving information includes
using a hypertext transfer protocol to display to a user a display generated from a first markup language,
containing a list of profiles, and at least one corresponding topic for each of the list of profiles.
Searches may be performed on the information depending upon selected profiles and topics, and entered
keywords.



[Fig. 4: drawing not reproduced; legible labels include HTML and SGML/HTML]



EP 0 718 783 A1 



Description 

The present invention relates to a computer implemented method and system for information retrieval.

More specifically, the present invention relates to a client/server model for information retrieval which includes the
receipt of a plurality of information organized by profile and topic in a first markup language, and the parsing of the
plurality of information into portions of information in a second markup language, including anchors referencing each of
the portions of information to allow hypertext viewing and accessing.

The development of computerized information resources, such as the Internet, and various on-line services, such
as Compuserve, America Online, Prodigy, and other services, has led to a proliferation of electronically-available
information. In fact, this electronic information is increasingly displacing more conventional means of information
transmission, such as newspapers, magazines, and even television. The primary issue in all of these resources is filtering
the vast amount of information which is available so that a user obtains the information which is of interest to him.

Currently, a user who subscribes to one of the above services, or uses the Internet, in some circumstances manually
scans through the various information resources in order to obtain articles, postings, or other files which are of interest.
Typically, the user manually scans in areas or sources of interest (publications, USENET Newsgroups, fora, or other
areas classified by topic) to find topic areas which may be of interest to the user. The user then retrieves articles or files
which have subject headings, for example, matching those which the user wishes to read. For example, in USENET
newsgroups on the Internet, postings frequently have relevant subject headings to permit manual scanning. Article
headings in newspapers/magazines serve similar functions. Thus, if the user does not wish to view those stories with products
for sale, then he simply ignores those postings having the term "For Sale" in the subject heading. Finally, as a last level
of filtering, the user can read the stories which have been filtered by topic and subject heading, and if at any time during
the viewing of the story the user wishes to ignore the rest of the article, he can stop reading it and simply discard the
story. If desired, the user can download the remaining stories in any number of prior art ways, for off-line viewing.

As is clearly evident, this manual scanning process which a user must engage in is very time-consuming. To some
extent, news is already filtered for readers. Those that subscribe to certain magazines/newspapers, or view certain
television networks, already have the vast base of electronically-available information (e.g. wire services) filtered for
them. However, this "filtering" is performed on a large-scale basis, for a wide audience. The tailoring of specific stories
of interest to particular users has not been performed. Thus, a need has arisen to automatically sort through the large
variety of electronic sources in order to generate a subset of the stories available in electronic form which is tailored to
a user's specific interests.

Other services have recently become available. Some allow the retrieval of information via facsimile or other means,
based upon user-specified parameters. Certain commercial services now provide a facsimile service to retrieve articles
of interest based upon predefined profiles. These profiles include those grouped by company or by subject area. The
articles of interest are retrieved using standard prior art techniques (e.g. boolean search terms). These services do not,
to a large extent, however, provide the information in digital form (e.g. on a user-interface display), nor do they allow the
browsing of information which is retrieved from these sources.

Thus, the prior art of obtaining information from various sources suffers from several shortcomings. 

SUMMARY 

A computer-implemented method and system for retrieving information. A first file of information is received which
includes a first markup language to identify contents of the information. Responsive to receiving the first file of
information, the first file of information is parsed to generate a list of profiles, and at least one corresponding topic for
each of the list of profiles. A second file in a second markup language is created containing the list of the profiles, and
at least one corresponding third file is created in a third markup language for the at least one corresponding topic for
each of the list of profiles. The second file contains anchors referencing each at least one corresponding third file, and
first markup instances in the first file of information are converted to second markup instances in either the second file
or the third file. The first file of information is parsed to determine the at least one article, if any, for each at least one
corresponding topic for each of the list of profiles, and a corresponding brief for the at least one article. A fourth file
and a fifth file are generated for the at least one article, if any, for each at least one corresponding topic for each
of the list of profiles. The fourth file includes a brief of each of the at least one article in the first file of information and an
anchor to the fifth file, the fifth file including text for the at least one article, if any, for each at least one corresponding
topic for each of the list of profiles. In implemented embodiments, a sixth file can also be created which contains a
plurality of anchors referencing a plurality of the fifth files, wherein the anchors in the sixth file are arranged by each
profile and corresponding topic. Receiving the first file of information can include receiving an electronic mail (e-mail) message.
The processing of the first file, and the creation of the additional files with the included anchors, allows hypertext viewing
and traversal of the processed information.

In another embodiment, a computer-implemented method and apparatus for retrieving information includes using
a hypertext transfer protocol to display to a user a display generated from a first markup language, containing a list of
profiles, and at least one corresponding topic for each of the list of profiles. The user is allowed to select via the hypertext
transfer protocol a first plurality of the at least one corresponding topic for a second plurality of the list of profiles. The
user is allowed to enter via the hypertext transfer protocol at least one search term. A search is performed using the
search term in a first database of information, the first database of information organized by the list of the profiles, and
the at least one corresponding topic for each of the list of profiles. The search is performed in locations in the first
database of information which correspond with each of the first plurality of the at least one corresponding topic for the
second plurality of the list of profiles. Markup language files can be generated showing the results of the search. This
search may include a FreeWAIS (inverted index) search of the parsed information.
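
The scoped search described above can be illustrated with a minimal sketch. It does not use FreeWAIS; it substitutes a small in-memory inverted index, and the names `build_index`, `search`, and the sample article records are hypothetical, introduced only for illustration.

```python
from collections import defaultdict

def build_index(articles):
    """Map each lowercased word to the set of article ids containing it."""
    index = defaultdict(set)
    for art_id, art in articles.items():
        for word in art["text"].lower().split():
            index[word.strip('.,;:"()')].add(art_id)
    return index

def search(index, articles, term, selected):
    """Return ids of articles containing `term` whose (profile, topic)
    pair is among the pairs the user selected."""
    hits = index.get(term.lower(), set())
    return sorted(a for a in hits
                  if (articles[a]["profile"], articles[a]["topic"]) in selected)

# Hypothetical sample data: two articles filed under different profiles/topics.
articles = {
    "94-307.rec_72": {"profile": "Sun Express News", "topic": "Catalog Sales",
                      "text": "Best Products reported sales for fiscal October."},
    "94-307.rec_73": {"profile": "Internet Watch", "topic": "Connectivity",
                      "text": "New connectivity products reported."},
}
index = build_index(articles)
result = search(index, articles, "reported",
                {("Sun Express News", "Catalog Sales")})
```

Both sample articles contain the term, but only the one filed under the selected profile and topic is returned, which mirrors the restriction of the search to selected locations in the database.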

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which
like references indicate like elements and in which:

Figure 1 shows a client/server system in which embodiments of the present invention may be implemented.
Figure 2 shows a more detailed view of the client/server illustrated in Figure 1.
Figure 3 shows the structure of a workstation in which the client or server may be operative.
Figure 4 shows a more detailed view of processes operative within the server.
Figure 5 shows an example of files which are processed in implemented embodiments of the present invention.
Figures 6a and 6b show more details of files which are created in the server.
Figures 7a-7d show a sequence of steps performed in the server for generation of a profile and/or creation of a
personal newspaper.
Figure 8 shows the process performed during a user session, including the performance of any searching.
Figures 9-13 show various display screens for user control and showing the results of the processes illustrated in
Figures 5a-5d and 7.

DETAILED DESCRIPTION 

A portion of the disclosure of this patent document contains material which is subject to copyright protection and to
which a claim of copyright protection is made. The owner has no objection to the facsimile reproduction by anyone of
the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves
all copyrights and similar rights whatsoever.

The present invention is a method and apparatus for automatically creating hypertext documents from information
using profiles and topics, and providing that information to a user. Although the following will be described with reference
to certain particular embodiments, including data structures, flow of steps, hardware configurations, etc., it will be
apparent to one skilled in the art that implementations of the present invention can be practiced without these specific
details.

Implementations of the present invention use a client/server architecture, as illustrated in Figure 1, wherein user
requests 110 for news are sent by a client application program 100 to a server 150 (typically, a remote computer system
accessible over the Internet or other communication medium). The server 150, as will be described in more detail below,
performs scanning and searching of raw (e.g. unprocessed) information sources (e.g. newswire feeds or newsgroups) and,
based upon these user requests, presents the filtered electronic information as server responses 160 to the client
process. The client process may be active in a first computer system, and the server process may be active in a second
computer system, and the two may communicate with one another over a communication medium, thus providing distributed
functionality and allowing multiple clients to take advantage of the information gathering capabilities of the server.

A more detailed view of the client and server is shown in Figure 2. Although the client and server are processes
which are operative within two computer systems, these processes being generated from a high level programming
language (e.g. PERL) which is interpreted and executed in a computer system at runtime (e.g. a workstation), it can be
appreciated by one skilled in the art that they may be implemented in a variety of hardware devices, either programmed
or dedicated.

Client 100 and server 150 communicate using the functionality provided by Hypertext Transfer Protocol (HTTP).
The World Wide Web (WWW) or simply the "Web" includes all the servers adhering to this standard which are accessible
to clients via Uniform Resource Locators (URL's). For example, communication can be provided over a communication
medium 250. In some embodiments, the client and server may be coupled via Serial Line Internet Protocol (SLIP) or
TCP/IP connections for high-capacity communication. Active within the client is a first process, known as a "browser"
200, which establishes the connection with server 150, and presents information to the user. Any number of commercially
or publicly-available browsers may be used in various implementations; however, in this implementation, browser 200
is the Mosaic brand browser available from the National Center for Supercomputing Applications (NCSA) in
Urbana-Champaign, Illinois. Other browsers, such as the Netscape, Netcruiser, or Lynx brand browsers, or others which are
available and provide the functionality specified under HTTP and the Mosaic browser, may be used.

The server 150 executes the corresponding server software which presents information to the client in the form of
HTTP responses. The HTTP responses correspond with the Web "pages" represented using Hypertext Markup Language
(HTML), or other data which is generated by the server, as will now be discussed.

Under the Mosaic brand browser, in addition to HTML functionality 210 provided by the server (display and retrieval
of certain textual and other data based upon hypertext views and selection of item(s)), a Common Gateway Interface
(CGI) 220 is provided which allows the client program to direct the server to commence execution of a specified program
contained within the server. This may include a search engine which scans received information in the server for
presentation to the user controlling the client. Using this interface, and HTTP, the server may notify the client of the results
of that execution upon completion.

In order to control the parameters of the execution of this server-resident process, the client may direct the filling
out of certain "forms" from his browser. This is also provided by the "fill-in forms" functionality 230 available under Mosaic,
which allows the user, via his client application program 100, to specify search terms on which the server will cause an
application program to operate (e.g. terms contained in the types of stories/articles which are of interest to the user).
The details of this searching mechanism will be discussed below.

A computer system, such as a workstation, personal computer or other processing apparatus in which the client
100 or server 150 may be operative is illustrated in Figure 3. A workstation in which one implementation of the present
invention may be practiced includes system 300. System 300 comprises a bus or other communication means 301 for
communicating information, and a processing means 302 coupled with bus 301 for processing information. System 300 further
comprises a random access memory (RAM) or other volatile storage device 304 (referred to as main memory), coupled
to bus 301 for storing information and instructions to be executed by processor 302. Main memory 304 also may be
used for storing temporary variables or other intermediate information during execution of instructions by processor 302.
System 300 also comprises a read only memory (ROM) and/or other static storage device 306 coupled to bus 301 for
storing static information and instructions for processor 302, and a data storage device 307 such as a magnetic disk or
optical disk and its corresponding disk drive. Data storage device 307 is coupled to bus 301 for storing information and
instructions. This may be used for storage of the various files to be described here, including profile, index, topic, and
article files.

System 300 may further be coupled to a display device 321, such as a cathode ray tube (CRT) or liquid crystal
display (LCD), coupled to bus 301 for displaying information to a computer user. Such a display 321 may further be
coupled to bus 301 via a frame buffer 310, which stores information such as a single or multiple frames or images for display
upon display device 321. An alphanumeric input device 322, including alphanumeric and other keys, may also be coupled
to bus 301 for communicating information and command selections to processor 302. An additional user input device
is cursor control 323, such as a mouse, a trackball, stylus, or cursor direction keys, coupled to bus 301 for communicating
direction information and command selections to processor 302, and for controlling cursor movement on display 321.

Note, also, that any or all of the components of system 300 and associated hardware may be used in various
embodiments; however, it can be appreciated that any configuration of the system may be used for various purposes according
to the particular implementation.

In one embodiment, system 300 is one of the Sun Microsystems® brand family of workstations such as the
SPARCstation brand workstation manufactured by Sun Microsystems® of Mountain View, California. Processor 302 may be
one of the SPARC brand microprocessors manufactured by Sun Microsystems®, Inc. of Mountain View, California.

Note that the following discussion of various embodiments will refer specifically to a series of
routines which are generated in a high-level programming language (e.g. the PERL interpretive language) which is
interpreted and/or executed in system 300 at run-time. These further are used in conjunction with the browser and server
software available from NCSA, as described above, including the specification of the appearance of displays in HTML.
It can be appreciated by one skilled in the art, however, that the following methods and apparatus may be implemented
in special purpose hardware devices, such as discrete logic devices, large scale integrated circuits (LSI's),
application-specific integrated circuits (ASIC's), or other specialized hardware. The description here has equal application to
apparatus having similar function.

Figure 4 shows in more detail the processes which form major components of implemented embodiments of the
present invention. An SGML/HTML parser/converter process 400, implemented in PERL, communicates with the server
via CGI 220, and can process and respond to user requests for information, including search requests [...]
resident upon the server. In addition, a search result formatter process 430 may also create certain HTML files 420
responsive to user search requests to perform a FreeWAIS search. The HTML versions of articles stored from the SGML
mail message are searched for the presence of specified search terms, and additional HTML files are created responsive
thereto. The HTML results of these search requests can then be displayed on the client's console. In implemented
embodiments, the HTML files 420 which are resident on the server are generated from an electronic mail (e-mail) message
which is received from a second process, the news source 410. This process may be operative on yet another
computer system and communicate with the server over a networking medium. In this implementation, the news source
embeds within the e-mail message Standard Generalized Markup Language (SGML) text which helps to identify and
categorize the text. This is illustrated in Figure 5. This is one implementation only, however, and other sources (e.g.
USENET Newsgroups) may be used as a news source.

In this implementation, a news source provides an e-mail message at some predetermined time period to the server
210, and the HTML generator 400 parses the message and creates HTML files which are made available during the
client's session. The e-mail message contains embedded SGML text, which includes profile/topic (key) information. Of
course, the text may be received in any number of descriptive/formatting languages. A profile, in this implementation, is
a named group of one or many topics. For example, a profile list file may be entitled "Sun_Express_News.list" wherein a single profile
name may be entitled "Direct Marketing." Keys (or topics) may be, for example, "Telemarketing" and "Catalog Sales." In
another implementation, a profile/topic may include a USENET newsgroup and subject heading. Individual topics, in
this implementation, are generated via a search of large numbers of publications using heuristic techniques to obtain
the topics and group them by profile. This front-end searching and categorization into profiles and topics is performed
using a service available from Individual, Inc. Also, profiles are preset with specific topic(s) contained therein, although
profiles and/or topics may be user-defined in other implementations.

A separate directory is used for storing each topic. As illustrated in Figures 5 and 6b, a directory 501 representing a
profile (named "Internet Watch" in the example) is created, if required. If it already exists, then
subdirectories (e.g. 502 and 503, named "Connectivity" and "Making Money") for the topic(s) contained within the profile also
are present (and created, if required). Within each subdirectory, files are created (e.g. 504) which contain the parsed
articles themselves contained within the e-mail message 500. Two types of files are stored for each article: a brief of
the article, and the article itself. Briefs (a.k.a. abstracts) are used by the user during browsing of the results of the
information retrieval to determine if a detailed review of the specific article is required.
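
The profile/topic directory layout described above can be sketched as follows. This is only an illustration under stated assumptions, not the PERL routines of the implemented embodiment: the helper names `safe` and `store_article` are hypothetical, and replacing spaces with underscores in directory names is an assumption rather than something the description specifies.

```python
from pathlib import Path

def safe(name):
    """Assumed convention: turn a profile/topic name into a directory name."""
    return name.replace(" ", "_")

def store_article(root, profile, topic, day, story, brief_html, article_html):
    """Create <root>/<profile>/<topic>/ if required, then store the two
    files kept for each article: its brief and the article itself."""
    topic_dir = Path(root) / safe(profile) / safe(topic)
    topic_dir.mkdir(parents=True, exist_ok=True)   # no error if already present
    brief_path = topic_dir / f"{day}.rec_brf_{story}.html"
    article_path = topic_dir / f"{day}.rec_{story}.html"
    brief_path.write_text(brief_html)
    article_path.write_text(article_html)
    return brief_path, article_path
```

For example, `store_article("/tmp/news", "Internet Watch", "Making Money", "94-307", 72, brief, article)` would leave both files under `/tmp/news/Internet_Watch/Making_Money/`, creating either directory only when it does not already exist.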

The various files created during SGML processing are shown in Figures 6a and 6b. In a specified directory (e.g.
the root directory accessible via the Uniform Resource Locator [URL]) the html files index.html 610 and
expanded_index.html 620 are stored. The index.html file 610 contains a list of all the profiles which are currently defined
(as received in the SGML file). The expanded_index.html file 620 contains a list of the profiles along with their associated
anchors referencing a list of abstracts (briefs) for each topic.

These lists of abstracts are contained in key files (e.g. 620, 630), for each topic. The index.html file 610,
expanded_index.html 680, and key files are all created after parsing of the article files, wherein anchors are created in the various
files in order to allow hypertext cross-referencing of the various related files and/or documents. As the SGML file is
parsed, profile files (lists of topics) 640, 641, 642, etc. are created in order to keep track of profiles. These are stored
in a profiles directory 630.
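
As a rough illustration of how an expanded index with anchors to per-topic key files might be produced, the following sketch renders one from a mapping of profiles to topics. The function name `expanded_index`, the `profile/topic/key.html` path layout, and the underscore substitution are assumptions made for the example; the description does not fix them.

```python
from html import escape

def expanded_index(profiles):
    """Render a list of profiles, each with anchors to the key file
    (list of briefs) of every topic in that profile."""
    lines = ["<html>", "<title>Expanded Index</title>", "<body>", "<ul>"]
    for profile, topics in profiles.items():
        lines.append(f"<li>{escape(profile)}")
        lines.append("<ul>")
        for topic in topics:
            # Assumed layout: one directory per profile, one per topic.
            href = f"{profile}/{topic}/key.html".replace(" ", "_")
            lines.append(
                f'<li><A HREF="{escape(href, quote=True)}">{escape(topic)}</A>')
        lines.append("</ul>")
    lines += ["</ul>", "</body>", "</html>"]
    return "\n".join(lines)

page = expanded_index({"Internet Watch": ["Connectivity", "Making Money"]})
```

A plain index.html listing only the profiles could be produced the same way by omitting the inner loop.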

As shown in Figure 6b, a topics directory 650 references each of the topics, stored as directories 660, 661, 662,
etc. In each topic directory (e.g. 660), a key file 670 is stored which contains, by topic, references to each of the articles
(e.g. 673) contained within the directory. As will be illustrated below, the key file 670 contains titles represented as
anchors to the articles themselves, and associated abstracts (from the brief files, e.g. 672) stored in the directory. Each
brief file, such as 672, also contains an anchor to the article file. Lastly, summary files, such as summary*.html 671, are
stored in the directory which contain a previous week's summary of titles, represented as anchors, of articles stored in
the topic directory. Summary files are stored with the file specification summary<date#>.html, wherein date# is a julian
date for a previous week's date. Of course, any unique file specification may be used.

The SGML file is processed twice to obtain relevant information. First, it is parsed to obtain the articles and briefs
for each article. Directories are created, if required, according to the profiles/topics to which each article and brief
are relevant. The articles and briefs are then stored into these subdirectories. A second pass over the profile and topic
subdirectories causes the linkage of the index.html, expanded_index.html, key.html, and article html files for each topic
for hyperlink cross-referencing. The details and mechanics of this are discussed below.

An example of a story contained within an SGML formatted message is shown as follows:



<rec>
<title>
BEST PRODUCTS ANNOUNCES OCTOBER SALES
</title>
<date>11-03-94</date>
<datel>Nov. 3, 1994</datel>
<brief>
<paragraph>
Best Products Co., Inc. today reported sales for fiscal
October, the four weeks ended October 29, 1994, increased
2.4% to $102.0 million compared to $99.6 million for the
four weeks ended October 30, 1993.
</paragraph>
</brief>
<source>Business Wire</source>
<fileid>b1103123.200</fileid>
<text>
<paragraph>
RICHMOND, Va. -- (BUSINESS WIRE) via First! -- Best
Products Co., Inc. (Nasdaq:BEST) today reported sales for
fiscal October, the four weeks ended October 29, 1994,
increased 2.4% to $102.0 million compared to $99.6 million
for the four weeks ended October 30, 1993. Comparable
store sales increased 0.8%
for the same period.</paragraph>
<paragraph>
Fiscal year-to-date sales increased 4.1% to $918.7 million
for the 39 weeks ended October 29, 1994 compared to $882.7
million for the same period the prior year. Comparable
store year-to-date sales for 1994 increased 3.4% to $886.7
million compared to $857.7 million for the same period
during 1993.</paragraph>
<paragraph>
Chief Executive Officer Stewart M. Kasen said, "October
sales strengthened during the second half of the month and
met the company's expectations. Jewelry sales continued
their positive trend, along with those of housewares and
home furnishings."</paragraph>
<paragraph>
Best Products, the nation's second largest catalog
showroom retailer, operates 160 Best showrooms and 12 Best
Jewelry stores in 22 states. The company also operates a
nationwide mail-order service.</paragraph>
<paragraph>
804/261-2150
212/850-5600</paragraph>
</text>
<catalog_items>
<key>Direct Marketing</key>
<key>Retail Department Stores</key>
<key>Retail Dist. Overview</key>
<profile>Sun Express News</profile>
<key>Direct Marketing</key>
<key>Catalog Sales</key>
</catalog_items>
</rec>
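
A minimal sketch of how fields and profile/topic groupings could be pulled out of such a record is given here. It uses regular expressions rather than a true SGML parser, and the helper names `field` and `catalog` are hypothetical. How keys that appear before any profile should be treated is not stated in the description; this sketch simply groups them under `None`, which is an assumption.

```python
import re

def field(rec, tag):
    """Return the text of the first <tag>...</tag> in a record, with any
    nested tags removed and whitespace collapsed; None if the tag is absent."""
    m = re.search(rf"<{tag}>(.*?)</{tag}>", rec, re.S)
    if not m:
        return None
    inner = re.sub(r"<[^>]+>", " ", m.group(1))
    return " ".join(inner.split())

def catalog(rec):
    """Return {profile: [topics]} from the <catalog_items> block.
    Keys seen before any <profile> are grouped under None (assumption)."""
    groups = {}
    block = re.search(r"<catalog_items>(.*?)</catalog_items>", rec, re.S)
    current = None
    if block:
        for tag, value in re.findall(r"<(profile|key)>(.*?)</\1>",
                                     block.group(1), re.S):
            value = " ".join(value.split())
            if tag == "profile":
                current = value
                groups.setdefault(current, [])
            else:
                groups.setdefault(current, []).append(value)
    return groups

# Shortened, hypothetical record in the same shape as the example.
sample = """<rec>
<title>
BEST PRODUCTS ANNOUNCES OCTOBER SALES
</title>
<brief>
<paragraph>
Best Products Co., Inc. today reported sales for fiscal October.
</paragraph>
</brief>
<catalog_items>
<key>Retail Department Stores</key>
<profile>Sun Express News</profile>
<key>Direct Marketing</key>
<key>Catalog Sales</key>
</catalog_items>
</rec>"""

title = field(sample, "title")
brief = field(sample, "brief")
groups = catalog(sample)
```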



After parsing to create the profile list, the SGML file is parsed again to obtain the brief and story files. HTML instances
are substituted for SGML instances. Other corresponding formatting commands and/or codes may be used which
correspond with those in the source language. In addition, an anchor is created referencing the article file itself, in order to
allow hypertext browsing from the brief to the article itself. Anchors may be implemented using any number of hypertext retrieval
techniques, and HTML is only discussed as one implementation. A brief file has a file name with the format:

<yr-day#.rec_brf_story#.html> (e.g. 94-307.rec_brf_72.html)
wherein day# is the julian calendar date for the year, and story# is a unique number assigned to the story as it was read
from the SGML file.
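
The naming scheme can be made concrete with a short sketch. It assumes that the "julian calendar date" means the ordinal day of the year, which is consistent with the example: November 3, 1994 is day 307. The function names are hypothetical.

```python
from datetime import date

def day_prefix(d):
    """Two-digit year and three-digit day of year, e.g. 94-307."""
    return f"{d.year % 100:02d}-{d.timetuple().tm_yday:03d}"

def brief_name(d, story):
    """File name of the brief for story number `story` received on date `d`."""
    return f"{day_prefix(d)}.rec_brf_{story}.html"

def article_name(d, story):
    """File name of the full article for the same story."""
    return f"{day_prefix(d)}.rec_{story}.html"

name = brief_name(date(1994, 11, 3), 72)   # "94-307.rec_brf_72.html"
```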

An example of a brief file is as follows: 
<dl><dt><A HREF="94-307.rec_15.html">BEST PRODUCTS
ANNOUNCES OCTOBER SALES
</A>
<dd>
Best Products Co., Inc. today reported sales for fiscal
October, the four weeks ended October 29, 1994, increased
2.4% to $102.0 million compared to $99.6 million for the
four weeks ended October 30, 1993.
<p>
</dl>
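
A brief file of this shape could be generated along the following lines. This is a sketch only: `render_brief` is a hypothetical helper, and the HTML escaping of the title and brief text is an addition not mentioned in the description.

```python
from html import escape

def render_brief(title, brief, article_file):
    """Build a brief entry: the title as an anchor to the article file,
    followed by the brief text as the definition body."""
    return (
        f'<dl><dt><A HREF="{escape(article_file, quote=True)}">{escape(title)}\n'
        "</A>\n"
        "<dd>\n"
        f"{escape(brief)}\n"
        "<p>\n"
        "</dl>\n"
    )

entry = render_brief(
    "BEST PRODUCTS ANNOUNCES OCTOBER SALES",
    "Best Products Co., Inc. today reported sales for fiscal October.",
    "94-307.rec_72.html",
)
```

Because each entry is a self-contained definition list, a key file for a topic can be assembled by concatenating the entries of all briefs in that topic's directory.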



An article file is similarly created wherein SGML instances are replaced by their HTML equivalents. An article file 
has a file name with the format: 

<yr-day#.rec_story#.html> (e.g. 94-307.rec_72.html) 
An example of an article or story file is as follows: 



<html>
<title>
BEST PRODUCTS ANNOUNCES OCTOBER SALES
</title>
<h1>
BEST PRODUCTS ANNOUNCES OCTOBER SALES
</h1><p>
<pre>
Date: 11-03-94
Source: Business Wire
FileID: b1103123.200
</pre><p>
<body>
RICHMOND, Va. -- (BUSINESS WIRE) via First! -- Best
Products Co., Inc. (Nasdaq:BEST) today reported sales for
fiscal October, the four weeks ended October 29, 1994,
increased 2.4% to $102.0 million compared to $99.6 million
for the four weeks ended October 30, 1993. Comparable
store sales increased 0.8% for the same period.<p>
Fiscal year-to-date sales increased 4.1% to $918.7
million for the 39 weeks ended October 29, 1994 compared to
$882.7 million for the same period the prior year.
Comparable store year-to-date sales for 1994 increased
3.4% to $886.7 million compared to $857.7 million for the
same period during 1993.<p>
Chief Executive Officer Stewart M. Kasen said, "October
sales strengthened during the second half of the month and
met the company's expectations. Jewelry sales continued
their positive trend, along with those of housewares and
home furnishings."<p>
Best Products, the nation's second largest catalog
showroom retailer, operates 160 Best showrooms and 12 Best
Jewelry stores in 22 states. The company also operates a
nationwide mail-order service.<p>
804/261-2150
212/850-5600<p>
</body>
<!-- Start Catalog -->
<pre>
Profile: Sun Express News
Topic: Direct Marketing
Topic: Catalog Sales
</pre>
<!-- End Catalog -->
</html>



Details of a process performed for creating the profile list (and corresponding HTML file), topic, summary, article
and brief files in implemented embodiments will now be discussed with reference to Figures 7a-7d. The process is
implemented as an interpretative PERL script, and is idle until the expiration of a predetermined time period, step 702
(e.g. 15 minutes). Upon the expiration of this time period, it is determined whether the server has received a new mail
message containing the SGML formatted text, step 704. If so, then processing of the mail message can proceed.
At step 706, the mail message is copied to a local working directory for additional processing. The file has a name
with the format:

<yr-day#.sgml> (e.g. 94-307.sgml)
The file is then parsed at step 710 to remove articles, obtain profiles/keys, and create any topic directories, if required.
Subsequent to parsing of the SGML file and creation of HTML files, the directories are post processed at step 712 in order
to create the homepage and expanded index, and provide linkage among the various article, brief, summary, topic, index,
and expanded index files. Finally, at step 714, the FreeWAIS indices, used for performing keyword searching, are updated
to allow user searching of the profiles and topics.
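
The working file name described above appears to combine a two-digit year with the day of the year; the example 94-307.sgml matches November 3, 1994, the date of the example article. A minimal sketch of that naming, assuming a zero-padded three-digit day number (the source does not state the padding) and using the hypothetical function name `working_file_name`:

```python
from datetime import date


def working_file_name(received: date) -> str:
    """Build a <yr-day#>.sgml name from the date a message was received.

    %y is the two-digit year and %j the zero-padded day of the year.
    The padding is an assumption of this sketch.
    """
    return received.strftime("%y-%j") + ".sgml"


print(working_file_name(date(1994, 11, 3)))  # 94-307.sgml
```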

Figure 7b shows more details of the step 710 shown in Figure 7a. At step 730, it is determined whether there are
any more article files contained within the SGML file. If not (e.g. an end-of-file condition is detected), then the process
is complete. If so, however, then the next article file, including the profile/key information, is read at step 732 (e.g. the
next instance of <rec>). Stories are delimited by <rec></rec>. Then, the profile/key information contained within the article
can be processed.
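
The record-reading loop of steps 730 and 732 can be sketched as follows. This is an illustration only, written in Python rather than the PERL of the implemented embodiments; the name `iter_records` is hypothetical, and the sketch assumes records are not nested.

```python
import re

# Stories are delimited by <rec> ... </rec>; DOTALL lets a record span lines.
REC_RE = re.compile(r"<rec>(.*?)</rec>", re.DOTALL | re.IGNORECASE)


def iter_records(sgml_text):
    """Yield the body of each <rec> record in order (steps 730 and 732).

    Running out of matches plays the role of the end-of-file condition.
    """
    for match in REC_RE.finditer(sgml_text):
        yield match.group(1).strip()
```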

As shown in the example SGML file above, profiles are delimited by the expressions <profile></profile>. Topics are
delimited by the expressions <key></key> and follow the profile name. As shown above, profiles and keys (topics) follow
each article, and are delimited by the expressions <catalog_items></catalog_items>. If there are no more profiles/topics
for the article, as detected at step 734, then process 710 proceeds to step 736. If there are more profiles, then it is
detected at step 736 whether a profile file exists for the profile. If not, then a profile file is created at step 738. Then, at
step 740, it is detected whether a topic subdirectory exists for the topic. If not, then it is created at step 742 and an entry
listing the topic is added to the profile file. Once the entry has been added to the profile file, and the
topic subdirectory created, if required, then the article file is written at step 744, replacing all SGML instances with HTML
instances. The brief file is similarly written. As shown in the example above, the brief is delimited by <brief></brief>. At
step 748, it is detected whether any more topics are defined for this profile. If so, process 710 proceeds to step 740. If not,
it proceeds to step 734 for retrieval of the next profile in the article. As is evident from the flowchart, several copies of
the article and corresponding brief may be created; however, as an optimization, only links to a single copy of the brief
and corresponding article may be stored in the directories. Other modifications would be apparent to one skilled in the art.
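
The profile and topic handling of steps 734 through 742 can be sketched as below. This is a hedged illustration, not the implemented script. It assumes each key element follows, and is not nested inside, the profile element it belongs to. The names `parse_catalog`, `slug` and `ensure_topic_dirs` are hypothetical, as is the rule of replacing non-alphanumeric characters with underscores when forming directory names.

```python
import re
from pathlib import Path

CATALOG_RE = re.compile(r"<catalog_items>(.*?)</catalog_items>", re.DOTALL)
ITEM_RE = re.compile(r"<(profile|key)>(.*?)</\1>", re.DOTALL)


def parse_catalog(record):
    """Return [(profile, [topic, ...]), ...] for one article record."""
    result = []
    for block in CATALOG_RE.findall(record):
        for tag, text in ITEM_RE.findall(block):
            name = " ".join(text.split())  # collapse internal whitespace
            if tag == "profile":
                result.append((name, []))
            elif result:  # a key seen before any profile is ignored
                result[-1][1].append(name)
    return result


def slug(name):
    """Turn a profile or topic name into a directory name (assumed rule)."""
    return re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_")


def ensure_topic_dirs(root, catalog):
    """Create <root>/<profile>/<topic> if missing (steps 740 and 742)."""
    created = []
    for profile, topics in catalog:
        for topic in topics:
            path = Path(root) / slug(profile) / slug(topic)
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created
```

Writing the profile file entry and the article and brief files (steps 738 and 744) is left out of the sketch to keep it short.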

Figures 7c and 7d show the details of the post processing step 712. Step 712 is performed upon completion of
processing of the article files from the e-mail message. Upon entry into step 712, complete profiles and associated topics
are known. Thus, the index.html and expanded_index.html files may be created at step 750. Anchors to the brief files
and the corresponding articles are created within the process at step 752. At step 754 it is determined whether there are any
more topic subdirectories. If so, then the next topic subdirectory is retrieved at step 756. The key.html file (for referencing
the articles) is created at step 758. The process then continues in Figure 7d. At step 762 in Figure 7d, it is determined
whether there are any more brief files contained in the subdirectory. If not, then the process returns to step 754 of Figure
7c. If so, then the next brief file is retrieved at step 764. The corresponding article file is then retrieved at step 766. The
anchor to the article can then be added to the brief file at step 768. The brief file, including the anchor, can then be
inserted into the key.html file at step 770. This allows hypertext viewing of the article from the brief or key.html files via
a browser. Upon completion of this processing, full hypertext browsing among the index, expanded index, topic, brief,
or article files may then be accomplished.
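
The assembly of a key.html file from the brief files of one topic subdirectory (steps 758 through 770) can be sketched as follows. This is an assumption-laden illustration: the function name `build_key_page`, the tuple layout of `entries`, and the exact tags emitted are hypothetical, since the source does not reproduce a key.html file in this section.

```python
from html import escape


def build_key_page(topic, entries):
    """Return key.html text for one topic.

    entries is a list of (headline, brief_text, article_href) tuples, one
    per brief file found in the topic subdirectory.
    """
    lines = ["<html>", f"<title>{escape(topic)}</title>",
             "<body>", f"<h1>{escape(topic)}</h1>"]
    for headline, brief, href in entries:
        # Step 768: the anchor to the article is attached to the brief.
        lines.append(f'<a href="{escape(href)}">{escape(headline)}</a>')
        # Step 770: the brief, with its anchor, goes into the key page.
        lines.append(f"<p>{escape(brief)}</p>")
    lines += ["</body>", "</html>"]
    return "\n".join(lines) + "\n"
```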

Figure 8 shows a process for displaying/searching using the above-created HTML files, and additional files which
can be created on demand, after execution of a search. A session typically commences when the HTTP server detects
a request for a client connection, step 802. Then, the requested page, typically the home page, is displayed on the client
browser at step 804. Any other requested page can be displayed using normal HTTP exchanges, including the profile
list (home page), the expanded profile list which includes topics arranged by profile, or any selected profile, topic, brief
or story page using the above-created HTML pages. In addition, the server detects whether a search has been requested
at step 806. If so, then a FreeWAIS (inverted index) search is performed, at step 808, in the selected profiles, if any. At
step 810, HTML page(s) containing the results of the search are generated, including story briefs and titles, organized
by relevancy. Relevancy may include any number of factors, including but not limited to, primacy, and number of
occurrences, in this order. Other, more sophisticated, relevancy measures may also be used. The story headings are displayed
as anchors which reference the stories themselves. Upon completion of the search, the requested page(s) are again
displayed, in this case the search results, at step 804.
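
The ordering by primacy and then by number of occurrences can be illustrated with a small ranking routine. This is a sketch under stated assumptions and does not reproduce how FreeWAIS computes its scores: "primacy" is taken here to mean how early the search term first appears in the story, and the names `relevancy_key` and `rank` are hypothetical.

```python
def relevancy_key(text, term):
    """Return a sort key (first position, -occurrences), or None if absent."""
    term = term.lower()
    words = [w.strip(".,;:!?\"'()") for w in text.lower().split()]
    positions = [i for i, word in enumerate(words) if word == term]
    if not positions:
        return None
    # Earlier first occurrence ranks higher; ties go to more occurrences.
    return (positions[0], -len(positions))


def rank(stories, term):
    """stories is a list of (title, text); return matching titles, best first."""
    scored = []
    for title, text in stories:
        key = relevancy_key(title + " " + text, term)
        if key is not None:
            scored.append((key, title))
    scored.sort()
    return [title for _, title in scored]
```

Stories that do not contain the term are dropped rather than ranked last, which matches a results page that lists only relevant items.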

Figures 9a and 9b show a scrollable display which is displayed as the main home page HTML file. A search field
910, which allows user input at any time using fill-in forms support under HTTP, is displayed for FreeWAIS searching at
any time in the selected profile(s). Each of the profile names (e.g. 920, 921, etc.) is selectable as an anchor, either as a
profile for the search or for referencing the expanded home page (Figure 10).

The expanded home page display is shown in Figure 10. This includes profile headings (e.g. 1020, 1030) and
corresponding topic anchors (1021, 1022, 1023, and 1031) for access to the topic main pages. Again, the expanded
home page is searchable by entering text into field 1010.

Figures 11a-11c include a scrollable display which shows the Topic Summary page accessible by selecting any of
the topic anchors shown in Figure 10. This includes the topic heading 1100, and anchor 1101 to the previous topic, if
any, for the profile. This also includes a next topic anchor 1102, and a return to profile page anchor for returning to the home
page shown in Figure 9. Anchors for story headings 1111, 1112, etc. are present for access to the stories themselves.
The display also includes the brief for each story (e.g. 1121). For a previous day, headings only (1130, 1131, etc.) are
shown and allow access to those stories, as shown in Figure 11b. Anchor 1140 of Figure 11c allows access to an
HTML page for a previous week's stories for the given topic, if any.

Figures 12a and 12b show the results of a FreeWAIS search, as described above. The search term(s) are shown
in field 1200, and story headings (e.g. 1202), which act as anchors to the story pages themselves, are shown sorted by
relevancy. The relevancy score, including the date, is shown as 1201 for each article. The text from the brief HTML file
is shown as 1203.

Lastly, story text is displayed on a page such as that shown in Figure 13. 1300 includes the story title, 1301 includes
identifying information regarding the story, and 1302 shows the story text itself.

Thus, a method and apparatus for retrieving information has been described. Note that though the foregoing has
particular utility and has been described with reference to certain specific embodiments in the figures and the text,
one may practice the present invention without implementing all of these specific details. Thus, the figures and the text
are to be viewed in an illustrative sense only, and not to limit the present invention. The present invention is only to be
limited by the appended claims which follow.

Claims 

1. A computer-implemented method of retrieving information comprising the following steps:

a. receiving a first file of information which includes a first markup language to identify contents of said
information;

b. responsive to said receiving said first file of information, parsing said first file of information to generate a list
of profiles, and at least one corresponding topic for each of said list of profiles;

c. generating a second file in a second markup language containing said list of said profiles and at least one
corresponding third file in a third markup language for said at least one corresponding topic for each of said list
of profiles, wherein said second file contains anchors referencing each at least one corresponding third file, and
first markup instances in said first file of information are converted to second markup instances in either said
second file or said third file;

d. parsing said first file of information to determine said at least one article, if any, for said each at least one
corresponding topic for said each of said list of profiles, and a corresponding brief for said at least one article; and

e. generating a fourth file and a fifth file for said at least one article, if any, for said each at least one corresponding
topic for said each of said list of profiles, said fourth file including a brief of each said at least one article in said
first file of information and an anchor to said fifth file, said fifth file including text for said at least one article, if
any, for said each at least one corresponding topic for said each of said list of profiles.

2. The method of claim 1 further comprising the step of generating a sixth file which contains a plurality of anchors
referencing a plurality of said fifth files, wherein said anchors in said sixth file are arranged by each said profile and
corresponding topic.

3. The method of claim 1 wherein said step of receiving said first file of information includes receiving an electronic
mail (e-mail) message.

4. The method of claim 1 further comprising the step of creating a directory for each of said list of said profiles and at
least one subdirectory for said at least one corresponding topic for each of said list of profiles, and storing said third
and fifth file for said at least one article, if any, for said each at least one corresponding topic for said each of said
list of profiles in said subdirectory.

5. The method of claim 1 wherein said generating said at least one corresponding third file for said at least one
corresponding topic for each of said list of profiles, includes generating a first level heading for each of said list of
profiles, and generating a second level heading for said at least one corresponding topic for each of said list of
profiles.

6. The method of claim 5 wherein said second level heading for said at least one corresponding topic for each of said
list of profiles comprises an anchor referencing said fifth file.

7. A computer-implemented method of retrieving information comprising the following steps:

a. parsing a first file of information to generate a list of profiles, and at least one corresponding topic for each
of said list of profiles;

b. generating a second file containing said list of said profiles and said at least one corresponding topic for each
of said list of profiles;

c. parsing said first file of information to determine said at least one article, if any, for said each at least one
corresponding topic for said each of said list of profiles;

d. generating a third file for said at least one article, if any, for said each at least one corresponding topic for
said each of said list of profiles, said third file including text for said at least one article, if any, for said each at
least one corresponding topic for said each of said list of profiles; and

e. inserting an anchor in said second file referencing said third file for said at least one article, if any, for said
each at least one corresponding topic for said each of said list of profiles.

8. The method of claim 7 further comprising the step of receiving said first file of information in an electronic mail
(e-mail) message.

9. The method of claim 7 wherein said inserting an anchor in said second file referencing said third file for said at least
one article, if any, for said each at least one corresponding topic for said each of said list of profiles includes
generating a first level heading for each of said list of profiles, and generating a second level heading for said at least
one corresponding topic for each of said list of profiles, wherein said second level heading includes a first anchor
referencing said third file.

10. A computer-implemented method of retrieving information comprising the following steps:

a. using a hypertext transfer protocol, displaying to a user a display generated from a first markup language,
containing a list of profiles, and at least one corresponding topic for each of said list of profiles;

b. allowing a user to select via said hypertext transfer protocol a first plurality of said at least one corresponding
topic for a second plurality of said list of profiles;

c. allowing said user to enter via said hypertext transfer protocol at least one search term; and

d. executing a search using said search term in a first database of information, said first database of information
organized by said list of said profiles, and said at least one corresponding topic for each of said list of profiles,
wherein said search is performed in locations in said first database of information which correspond with each
of said first plurality of said at least one corresponding topic for said second plurality of said list of profiles.

11. The method of claim 10 further comprising the step of displaying, via said hypertext transfer protocol, results of
said search using said search term in said first database, said results of said search including a first display generated
from a first markup language file representing said results of said search, said first markup language file including
at least one anchor, wherein said each at least one anchor references a second file containing text which includes
said search term for each article for one of said first plurality of said at least one corresponding topic for said second
plurality of said list of profiles.

12. The method of claim 10 wherein said step of executing said search using said search term in said first database of
information includes performing a FreeWAIS search of said first database.

[Figures 11a-11c: Topic Summary page]

Intel_News

Previous Topic | Next Topic | Return to Profile Page



12/13/94

Replacement Pentiums

Hewlett-Packard Co. said Wednesday it will provide replacement
microprocessors for customers who have technical problems with the Pentium
chip used in Hewlett computers.

Intel hires exterminators to kill Pentium bug

SAN JOSE, California -- Intel Corp. said it will work with software experts
to short-circuit the bug in its Pentium microprocessor. Richard Wirt, an
Intel [...] the error.

****Internet Backlash On Pentium Flaw

Intel Corporation, which makes the Pentium processor, is being besieged by
angry buyers after an American mathematician discovered a flaw in the chip's
"floating point unit."

Pentium Jokes from the Internet

Intel's Pentium woes have spawned a batch of new jokes on the Internet and
other online systems. A sampling: Q: What is the successor to RU-486 birth
control drug?

San Jose Mercury News, [...] Gillmor Column

PENTIUM ISSUE HAS BUGGED LOTS OF READERS: Want to get lots of
mail, phone calls and faxes? Take one of Silicon Valley's major icons to task
in print.

When Customer Complaint Hits the Internet, Even Intel Must Listen

Thomas Nicely just couldn't make the numbers work. On June 13, after
running billions of calculations on his Pentium computer, the Lynchburg
College math professor discovered the numbers didn't divide right.

Cyrix says courts rule in favor vs Intel

Cyrix Corp said that a district court has ruled that International Business
Machines Corp may manufacture microprocessors for Cyrix under IBM's
patent cross license with Intel Corp.

Intel shares off on IBM decision to halt

Intel Corp shares fell sharply in heavy trading after International Business
Machines Corp announced it would halt shipments of personal computers
made with Intel's Pentium chip, analysts said.

Dell has no plans to halt shipping Pentium

Dell Computer Corp said it has no plans to halt shipments of its
Pentium-based personal computers because of the previously announced flaw
in the chip and it added that demand for Pentium-based PCs remains strong.

Intel CEO to speak to analysts 1700 EST

Intel Corp chief executive officer Andrew Grove will hold a telephone
conference with analysts and some reporters at 1700 EST/2200 GMT,
Intel said.

CompUSA Pentium PC sales not yet hit by flaw

CompUSA Inc said it has not demand for its Pentium-based PCs remains
heavy, with sales apparently unaffected by recent reports of the flaw in the
Intel Corp chip.

Intel does not see a Q4 charge from flaw

Intel Corp does not expect to take a charge against fourth-quarter earnings
because of any repercussions from the flaw discovered in its Pentium chip,
said Andrew Grove, chief executive officer.

IBM HALTS SHIPMENTS OF PENTIUM-BASED PERSONAL COMPUTERS
BASED ON COMPANY RESEARCH

IBM HALTS SHIPMENTS OF PENTIUM-BASED PERSONAL
COMPUTERS BASED ON COMPANY RESEARCH- IBM today
announced it has stopped shipment of all IBM personal computers based on the
Intel Pentium microprocessor.

Intel says IBM shipment halt is unwarranted

In response to an IBM press release, Intel reiterated that it has studied the
Pentium processor flaw for months and has concluded that the frequency of
encountering reduced precision in floating point divide operations is once in
every 9 billion random divide operations.



12/12/94

• Intel Announces New Chip Plant

• Va. Professor Found Chip Error

• INTEL TARGETS PENTIUM AT EUROPEAN HOME EDUCATION
MARKET

• PC vendors grapple with FPU support - Many telling customers to call Intel







EP 0 718 783 A1 



• wrsP24T delayed

• Intel Improves Windows Support in Fax Boards

• Intel Track Record Suffers Another Blow

• Software Vendors Look To Repel Pentium FPU Bug



Previous Week's News



Cacjxiciit br First: 



[Figures 12a and 12b: FreeWAIS search results]



newsfeed 



Enter keyword(s): 



newsfeed contains the following items relevant to 'solaris':

0: Score=1000 Date=9/16/94

Seybold - SunSoft In Document Viewer Deal

From Seybold '94, Electronic Book Technologies and SunSoft have
announced an agreement to create a universal document viewer for
Solaris users.

1: Score=1000 Date=9/7/94

SunSoft ships Netware Protocol Service for Solaris

SunSoft Inc. today began shipping PC Protocol services, a set of
services for IPX/SPX compatible-access into the Solaris operating
environment.

2: Score=994 Date=9/16/94

Seybold - SunSoft In Document Viewer Deal

From Seybold '94, Electronic Book Technologies and SunSoft have
announced an agreement to create a universal document viewer for
Solaris users.

3: Score=990 Date=9/7/94

SunSoft ships Netware Protocol Service for Solaris

SunSoft Inc. today began shipping PC Protocol services, a set of
services for IPX/SPX compatible-access into the Solaris operating
environment.

4: Score=990 Date=9/7/94

SunSoft ships Netware Protocol Service for Solaris

SunSoft Inc. today began shipping PC Protocol services, a set of
services for IPX/SPX compatible-access into the Solaris operating
environment.

5: Score=990 Date=9/7/94

SunSoft ships Netware Protocol Service for Solaris

SunSoft Inc. today began shipping PC Protocol services, a set of
services for IPX/SPX compatible-access into the Solaris operating
environment.

6: Score=975 Date=11/15/94

ED ZANDER, SUNSOFT PRESIDENT TALKS ABOUT OBJECTS

After releasing Solaris 2.4 last month, SunSoft Inc. is racing to beat
competitors that are implementing object-oriented technology in
operating systems.

7: Score=972 Date=9/7/94

SunSoft ships Netware Protocol Service for Solaris

SunSoft Inc. today began shipping PC Protocol services, a set of
services for IPX/SPX compatible-access into the Solaris operating
environment.

8: Score=957 Date=11/15/94

ED ZANDER, SUNSOFT PRESIDENT TALKS ABOUT OBJECTS

After releasing Solaris 2.4 last month, SunSoft Inc. is racing to beat
competitors that are implementing object-oriented technology in
operating systems.

9: Score=957 Date=11/22/94

PLASMON PORTS MANAGER [...]

Plasmon Data Systems, of Milpitas, CA, is making its MANAGER
jukebox integration and management software available for Windows
for Workgroups, OS/2, and Solaris environments.

10: Score=939 Date=9/13/94

Amdahl and Sun Deliver High-Performance Enhancements for Solaris

Amdahl Corporation and Sun Microsystems Computer Company today
introduced the first product created as a result of the Amdahl-Sun
strategic alliance.

11: Score=939 Date=9/13/94

Amdahl and Sun Deliver High-Performance Enhancements for Solaris

Amdahl Corporation and Sun Microsystems Computer Company today
introduced the first product created as a result of the Amdahl-Sun
strategic alliance.

[Figure 13: story page]

UK - NEC, VideoLogic Team On 3D Games

Date: 12-16-94

Source: Newsbytes

FileId: n1216161.805



LONDON, ENGLAND, 1994 DEC 16 (NB) via First! -- NEC of Japan and
VideoLogic in the UK have teamed up to develop a new three-dimensional (3D) video
graphics system that they claim will "revolutionize" video games and desktop multimedia
PC systems.

The technology, which has been developed so far by VideoLogic and is known by the
name PowerVR (Power Virtual Reality), is being licensed to NEC so that both
companies can develop firmware capable of supporting 3D and VR technology for the
consumer marketplace. VideoLogic officials say that the technology should arrive in the
marketplace by the end of 1995.

According to VideoLogic, PowerVR has been in development for more than three years
and provides true photo quality pictures in animated form, giving the user a sense of 3D
realism and special effects that no other computer systems can generate.

One of the most interesting features of PowerVR is that the graphics technology is
scalable, meaning that even lowly 32-bit games systems can produce the effect,
although, obviously, not at the same level of resolution as on dedicated 64-bit systems
against which the PowerVR system is aimed.

VideoLogic officials say that making realistic computer games is only one step on the
ladder of technology as far as PowerVR is concerned. They point out that industrial
applications, such as "walk through" VR building designs and even VR cities on
computer could be sculpted using the PowerVR technology.

"The ultimate aim in the games market, and, in the PC applications market, is to increase
the level of realism and interactivity. The fruits of our developments with NEC will move
games and other PC virtual reality applications into new realms of quality," explained
Derek MacLaren, VideoLogic's chairman, announcing the NEC licensing deal.

"Our goal is to remove the 'virtual' from virtual reality so that users of our technology
can feel like they are actually in Jurassic Park rather than just watching the movie," he
said.

(Steve Gold/19941216/Press Contact: Andrew Smith, A Plus Group +44-753-790700;
Reader Contact: Kazuya Yoshida, NEC +44-71-353-4383; Neil Davison, VideoLogic
+44-923-260511)

Profile: Multimedia

Topic: Virtual Reality